CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection
Robust face detection in the wild is an essential component in support of
various face-related tasks, e.g. unconstrained face recognition, facial
periocular recognition, facial landmarking and pose estimation, facial
expression recognition, 3D facial model construction, etc. Although the face
detection problem has been studied intensively for decades and underpins
various commercial applications, it still struggles in some real-world
scenarios due to numerous challenges, e.g. heavy facial occlusions, extremely
low resolutions, strong illumination changes, extreme pose variations, image or
video compression artifacts, etc. In this paper, we present a face detection approach
named Contextual Multi-Scale Region-based Convolution Neural Network (CMS-RCNN)
to robustly solve the problems mentioned above. Similar to other region-based
CNNs, our proposed network consists of a region proposal component and a
region-of-interest (RoI) detection component. Unlike those networks, however,
our proposed network makes two main contributions that enable
state-of-the-art performance in face detection.
Firstly, multi-scale information is aggregated in both the region proposal and RoI
detection stages to deal with tiny face regions. Secondly, our proposed network allows
explicit body contextual reasoning in the network, inspired by the human vision
system. The proposed approach is benchmarked on two recent challenging face
detection databases: the WIDER FACE Dataset, which contains a high degree of
variability, and the Face Detection Dataset and Benchmark (FDDB). The
experimental results show that our proposed approach, trained on the WIDER FACE
Dataset, outperforms strong baselines on the WIDER FACE Dataset by a large
margin, and consistently achieves competitive results on FDDB against recent
state-of-the-art face detection methods.
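The multi-scale grouping described above can be illustrated with a minimal sketch: features RoI-pooled from several convolutional stages are L2-normalised, so that no single layer dominates, and then concatenated into one descriptor. The layer shapes below are toy assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def fuse_multiscale_roi_features(feature_maps, eps=1e-12):
    """L2-normalise each RoI-pooled feature map, then concatenate them
    into a single descriptor (illustrative sketch)."""
    parts = []
    for fmap in feature_maps:
        v = fmap.ravel().astype(np.float64)
        parts.append(v / (np.linalg.norm(v) + eps))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
# toy stand-ins for RoI-pooled outputs of three conv stages
f3 = rng.random((4, 4, 8))
f4 = rng.random((4, 4, 16))
f5 = rng.random((4, 4, 32))
desc = fuse_multiscale_roi_features([f3, f4, f5])
print(desc.shape)  # (896,)
```

The per-layer normalisation is what makes features of very different magnitudes from shallow and deep layers comparable before a shared detection head sees them.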
Towards a reliable face recognition system
Face Recognition (FR) is an important area in computer vision with many applications, such as security and automated border control. Recent advancements in this domain have pushed model performance to human-level accuracy. However, varying real-world conditions pose further challenges to their adoption. In this paper, we investigate the performance of these models by analyzing a cross-section of face detection and recognition models. Experiments were carried out without any preprocessing on three state-of-the-art face detection methods, namely HOG, YOLO and MTCNN, and three recognition models, namely VGGFace2, FaceNet and ArcFace. Our results indicate that these methods rely significantly on preprocessing for optimum performance.
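One typical preprocessing step the abstract alludes to is face alignment from detected eye landmarks: a similarity transform maps the two eye centres onto canonical positions before the crop is fed to the recognition model. A minimal sketch follows; the canonical eye layout (crop size 112, eyes at 30%/70% width) is an assumed convention for illustration, not taken from the paper.

```python
import numpy as np

def eye_alignment_transform(left_eye, right_eye, out_size=112, eye_y=0.35):
    """2x3 similarity transform (rotation + scale + translation) mapping the
    detected eye centres onto assumed canonical positions."""
    src_l = np.asarray(left_eye, dtype=float)
    src_r = np.asarray(right_eye, dtype=float)
    dst_l = np.array([0.30 * out_size, eye_y * out_size])
    dst_r = np.array([0.70 * out_size, eye_y * out_size])
    d_src = src_r - src_l
    d_dst = dst_r - dst_l
    scale = np.linalg.norm(d_dst) / np.linalg.norm(d_src)
    angle = np.arctan2(d_src[1], d_src[0])  # tilt of the detected eye axis
    c, s = np.cos(-angle), np.sin(-angle)   # rotate that axis to horizontal
    R = scale * np.array([[c, -s], [s, c]])
    t = dst_l - R @ src_l
    return np.hstack([R, t[:, None]])  # 2x3 affine matrix

M = eye_alignment_transform((40.0, 60.0), (80.0, 50.0))
# applying M to the right eye lands it on its canonical position
mapped = M[:, :2] @ np.array([80.0, 50.0]) + M[:, 2]
print(np.round(mapped, 3))
```

Skipping exactly this kind of normalisation is what the experiments above test, by feeding raw detector crops straight into the embedding networks.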
Multi-view Face Detection Using Deep Convolutional Neural Networks
In this paper we consider the problem of multi-view face detection. While
there has been significant research on this problem, current state-of-the-art
approaches for this task require annotation of facial landmarks, e.g. TSM [25],
or annotation of face poses [28, 22]. They also require training dozens of
models to fully capture faces in all orientations, e.g. 22 models in HeadHunter
method [22]. In this paper we propose Deep Dense Face Detector (DDFD), a method
that does not require pose/landmark annotation and is able to detect faces in a
wide range of orientations using a single model based on deep convolutional
neural networks. The proposed method has minimal complexity; unlike other
recent deep learning object detection methods [9], it does not require
additional components such as segmentation, bounding-box regression, or SVM
classifiers. Furthermore, we analyzed the scores of the proposed face detector for
faces in different orientations and found that 1) the proposed method is able
to detect faces from different angles and can handle occlusion to some extent,
and 2) there appears to be a correlation between the distribution of positive
examples in the training set and the scores of the proposed face detector. The
latter suggests that the proposed method's performance can be further improved
by using better sampling strategies and more sophisticated data augmentation techniques.
Evaluations on popular face detection benchmark datasets show that our
single-model face detector algorithm has similar or better performance compared
to the previous methods, which are more complex and require annotations of
either different poses or facial landmarks. Comment: in International Conference on Multimedia Retrieval 2015 (ICMR
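A single-model detector such as DDFD still produces many overlapping candidate boxes per face and relies on non-maximum suppression to merge them. A minimal greedy NMS sketch (box format and threshold are illustrative, not the paper's exact settings):

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.3):
    """Greedy non-maximum suppression over (x1, y1, x2, y2) boxes.
    Returns indices of kept boxes, highest score first."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of the current best box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]  # drop boxes overlapping the kept one
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
kept = nms(boxes, np.array([0.9, 0.8, 0.7]))
print(kept)  # [0, 2]
```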
Stacking-fault energies for Ag, Cu, and Ni from empirical tight-binding potentials
The intrinsic stacking-fault energies and free energies for Ag, Cu, and Ni
are derived from molecular-dynamics simulations using the empirical
tight-binding potentials of Cleri and Rosato [Phys. Rev. B 48, 22 (1993)].
While the results show significant deviations from experimental data, the
general trend across the elements remains correct. This allows the potentials
to be used for qualitative comparisons between metals with high and low
stacking-fault energies. Moreover, the effect of stacking faults on the local
vibrational properties near the fault is examined. It turns out that the
stacking fault has the strongest effect on modes in the center of the
transverse peak, and its effect is localized in a region of approximately eight
monolayers around the defect. Comment: 5 pages, 2 figures, accepted for publication in Phys. Rev.
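The quantity derived in this abstract is the excess energy of a faulted supercell per unit fault area. A minimal sketch with the standard eV/Å² to mJ/m² conversion (the energies and area below are toy values, not the paper's results):

```python
# gamma_SF = (E_fault - E_perfect) / A_fault
# conversion: 1 eV/Angstrom^2 = 1.602e-19 J / 1e-20 m^2, reported in mJ/m^2
EV_PER_A2_TO_MJ_PER_M2 = 1.602176634e-19 / 1e-20 * 1e3

def stacking_fault_energy(e_fault_eV, e_perfect_eV, area_A2):
    """Excess energy of the faulted cell per unit fault area, in mJ/m^2."""
    return (e_fault_eV - e_perfect_eV) / area_A2 * EV_PER_A2_TO_MJ_PER_M2

# toy supercell energies (eV) and fault area (Angstrom^2)
gamma = stacking_fault_energy(-1399.72, -1399.75, 120.0)
print(round(gamma, 1))  # ~4.0 mJ/m^2 for these toy numbers
```

In practice both total energies come from relaxed molecular-dynamics or statics runs of the same supercell with and without the intrinsic fault.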
Image Co-localization by Mimicking a Good Detector's Confidence Score Distribution
Given a set of images containing objects from the same category, the task of
image co-localization is to identify and localize each instance. This paper
shows that this problem can be solved by a simple but intriguing idea, that is,
a common object detector can be learnt by making its detection confidence
scores distributed like those of a strongly supervised detector. More
specifically, we observe that given a set of object proposals extracted from an
image that contains the object of interest, an accurate strongly supervised
object detector should give high scores to only a small minority of proposals,
and low scores to most of them. Thus, we devise an entropy-based objective
function to enforce the above property when learning the common object
detector. Once the detector is learnt, we resort to a segmentation approach to
refine the localization. We show that despite its simplicity, our approach
outperforms state-of-the-art methods. Comment: Accepted to Proc. European Conf. Computer Vision 201
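The entropy-based idea can be illustrated directly: the softmax entropy over proposal scores is low exactly when the detector concentrates confidence on a few proposals, which is the property the objective enforces. A minimal sketch of the measured quantity, not the paper's full objective:

```python
import numpy as np

def score_entropy(scores):
    """Shannon entropy of the softmax over proposal scores; a strongly
    supervised detector yields a peaked, low-entropy distribution."""
    z = np.asarray(scores, dtype=float)
    p = np.exp(z - z.max())
    p /= p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

uniform = score_entropy(np.zeros(100))             # every proposal equally likely
peaked = score_entropy(np.r_[10.0, np.zeros(99)])  # one confident detection
print(uniform > peaked)  # True
```

Minimising this entropy during learning pushes the common detector toward giving high scores to only a small minority of proposals, as the abstract describes.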
Development of a tight-binding potential for bcc-Zr. Application to the study of vibrational properties
We present a tight-binding potential based on the moment expansion of the
density of states, which includes up to the fifth moment. The potential is
fitted to bcc and hcp Zr and it is applied to the computation of vibrational
properties of bcc-Zr. In particular, we compute the isothermal elastic
constants in the temperature range 1200K < T < 2000K by means of standard Monte
Carlo simulation techniques. The agreement with experimental results is
satisfactory, especially in the case of the stability of the lattice with
respect to the shear associated with C'. However, the temperature decrease of
the Cauchy pressure is not reproduced. The T=0K phonon frequencies of bcc-Zr
are also computed. The potential predicts several instabilities of the bcc
structure, and a crossing of the longitudinal and transverse modes in the (001)
direction. This is in agreement with recent ab initio calculations in Sc, Ti,
Hf, and La. Comment: 14 pages, 6 tables, 4 figures, revtex; the kinetic term of the
isothermal elastic constants has been corrected (Eq. (4.1), Table VI and
Figure 4).
A novel infrared video surveillance system using deep learning based techniques
This is the author accepted manuscript. The final version is available from Springer via the DOI in this record.
This paper presents a new, practical infrared video based surveillance
system, consisting of a resolution-enhanced, automatic target detection/recognition
(ATD/R) system that is widely applicable in civilian and military applications. To
deal with the issue of the small number of pixels on target in the developed ATD/R
system, as encountered in long-range imagery, a super-resolution method is
employed to increase target signature resolution and optimise the baseline quality
of inputs for object recognition. To tackle the challenge of detecting extremely
low-resolution targets, we train a sophisticated and powerful convolutional neural
network (CNN) based faster-RCNN using long wave infrared imagery datasets
that were prepared and marked in-house. The system was tested under different
weather conditions, using two datasets featuring target types comprising pedestrians
and 6 different types of ground vehicles. The developed ATD/R system can
detect extremely low-resolution targets with superior performance by effectively
addressing the small number of pixels on target encountered in long-range
applications. A comparison with traditional methods confirms this superiority
both qualitatively and quantitatively.
This work was funded by Thales UK, the Centre of Excellence for
Sensor and Imaging System (CENSIS), and the Scottish Funding Council under the project
“AALART. Thales-Challenge Low-pixel Automatic Target Detection and Recognition (ATD/ATR)”,
ref. CAF-0036. Thanks are also given to the Digital Health and Care Institute (DHI, project
Smartcough-MacMasters), which partially supported Mr. Monge-Alvarez’s contribution, and
to the Royal Society of Edinburgh and National Science Foundation of China for the funding
associated with the project “Flood Detection and Monitoring using Hyperspectral Remote Sensing
from Unmanned Aerial Vehicles”, which partially covered Dr. Casaseca-de-la-Higuera’s,
Dr. Luo’s, and Prof. Wang’s contribution. Dr. Casaseca-de-la-Higuera would also like to acknowledge
the Royal Society of Edinburgh for the funding associated with project “HIVE”.
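The pipeline shape of the ATD/R system above — enlarge the low-resolution input before handing it to the detector — can be sketched minimally. The paper employs a proper super-resolution method; nearest-neighbour upscaling below is only a crude stand-in to show where that stage sits:

```python
import numpy as np

def upscale_nearest(img, factor=2):
    """Nearest-neighbour upscaling: a crude placeholder for the
    super-resolution stage that enlarges small-target signatures
    before detection."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

thermal = np.arange(16, dtype=float).reshape(4, 4)  # toy long-wave IR patch
hires = upscale_nearest(thermal, factor=4)
print(hires.shape)  # (16, 16)
```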
Probabilistic Computation in Human Perception under Variability in Encoding Precision
A key function of the brain is to interpret noisy sensory information. To do so optimally, observers must, in many tasks, take into account knowledge of the precision with which stimuli are encoded. In an orientation change detection task, we find that encoding precision not only depends on an experimentally controlled reliability parameter (shape), but also exhibits additional variability. In spite of this variability in precision, human subjects seem to take precision into account near-optimally on a trial-to-trial and item-to-item basis. Our results offer a new conceptualization of the encoding of sensory information and highlight the brain's remarkable ability to incorporate knowledge of uncertainty during complex perceptual decision-making.
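What "taking precision into account" means can be sketched with a simple illustrative rule, not the authors' full Bayesian model: an observer who knows each item's encoding noise should weight an identical measured change more heavily when the item was encoded precisely (small sigma).

```python
import numpy as np

def precision_weighted_evidence(deltas, sigmas):
    """Per-item evidence for a change: the squared measured difference
    weighted by encoding precision 1/sigma^2 (an assumed illustrative
    rule, not the paper's model)."""
    deltas = np.asarray(deltas, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    return deltas ** 2 / (2.0 * sigmas ** 2)

# the same 5-degree measured change counts for more when the item
# was encoded with high precision (sigma = 1) than low (sigma = 4)
ev = precision_weighted_evidence([5.0, 5.0], [1.0, 4.0])
print(ev[0] > ev[1])  # True
```

An observer who ignored precision would treat both items identically, which is the suboptimal behaviour the experiments rule out.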